Clustered affinity scheduling on large-scale NUMA multiprocessors
Authors
Abstract
Modern shared-memory multiprocessors have high and non-uniform memory access (NUMA) costs, and communication cost increasingly dominates the execution time of parallel applications. Affinity-based algorithms, such as the affinity scheduling algorithm (AFS), perform better than dynamic algorithms such as guided self-scheduling (GSS) and trapezoid self-scheduling (TSS). However, as the number of processors increases, AFS suffers heavy overheads when migrating workload. These overheads include remote reads of the queues for index information, synchronous writes to the queues for migrating iterations, and the time spent loading data into cache. In this paper, we propose a new loop scheduling algorithm, clustered affinity scheduling (CAFS), to improve the affinity scheduling algorithm. We partition the processors into several clusters, and cluster-based migrations are carried out when imbalance occurs. We confirm our idea by running many applications under a realistic hierarchical memory simulator. Our results show that CAFS reduces both remote reads and synchronous writes to the queues by at least 1/3 under most applica-
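As a point of reference for the dynamic algorithms named in the abstract, the following is a minimal sketch of how guided self-scheduling is commonly described: each time a processor becomes idle, it takes a chunk equal to the remaining iterations divided by the number of processors, rounded up. The function name `gss_chunks` and its parameters are illustrative assumptions, not taken from the paper, and the sketch does not model queues, affinity, or the clustering that CAFS introduces.

```python
import math


def gss_chunks(n_iterations, n_processors):
    """Yield successive chunk sizes under the usual GSS rule.

    Assumption (not from the paper): every request takes
    ceil(remaining / n_processors) iterations from one central pool.
    """
    remaining = n_iterations
    while remaining > 0:
        chunk = math.ceil(remaining / n_processors)
        yield chunk
        remaining -= chunk


if __name__ == "__main__":
    # Chunk sizes shrink as the loop drains, which balances load late in
    # the loop but ignores which processor already holds the data in cache.
    print(list(gss_chunks(100, 4)))
```

Because every chunk comes from a single shared pool, a rule like this has no notion of data locality, which is the gap that affinity-based schedulers aim to close.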
Similar resources
Hierarchical loop scheduling for clustered NUMA machines
Loop scheduling is an important issue in the development of high performance multiprocessors. As modern multiprocessors have high and non-uniform memory access (NUMA) costs, the communication costs dominate the execution of parallel programs. Previous affinity algorithms perform better than dynamic algorithms under non-clustered NUMA multiprocessors, but they suffer heavy overheads when migrating ...
Multiprogrammed Parallel Application Scheduling in NUMA Multiprocessors
The invention, acceptance, and proliferation of multiprocessors are primarily a result of the quest to increase computer system performance. The most promising features of multiprocessors are their potential to solve problems faster than previously possible and to solve larger problems than previously possible. Large-scale multiprocessors offer the additional advantage of being able to execute ...
Experiences with Data Distribution on NUMA Shared Memory Multiprocessors
The choice of a good data distribution scheme is critical to performance of data-parallel applications on both distributed memory multiprocessors and NUMA shared memory multiprocessors. The high cost of interprocessor communication in distributed memory multiprocessors makes the minimization of communications the predominant issue in selecting data distribution schemes. However, on NUMA multipro...
Processor Pool-Based Scheduling for Large-Scale NUMA
Large-scale Non-Uniform Memory Access (NUMA) multiprocessors are gaining increased attention due to their potential for achieving high performance through the replication of relatively simple components. Because of the complexity of such systems, scheduling algorithms for parallel applications are crucial in realizing the performance potential of these systems. In particular, scheduling methods...
On the Importance of Parallel Application Placement in NUMA Multiprocessors
The thesis of this paper is that scheduling decisions in large-scale, shared-memory, NUMA (Non-Uniform Memory Access) multiprocessors must consider not only how many processors, but also which processors to allocate to each application. We call the problem of assigning parallel processes of an application to processors application placement. We explore the importance of placement decisions by me...
Journal title:
- Journal of Systems and Software
Volume 39, Issue
Pages -
Publication date 1997